An Information Extraction System for English Ontology Identifier Names

Author

  • Sandra Williams
Abstract

I describe a system, Txt2ids, that uses a series of regular expressions to extract suggestions for ontology identifier names from English text and classify them as (i) class names, (ii) individual names, (iii) object property names, or (iv) data property names. As well as being of practical use as a tool in an ontology authoring system, it also functions as a theoretical model of the syntactic organisation of identifier names. Regular expressions were derived from part-of-speech patterns in identifier names in a corpus of over 500 ontologies. Since ontology identifier names have syntactic structures that differ from natural English, the regular expressions were adapted. Extracted phrases were post-processed to comply with the structure of OWL Simplified English. A system sanity test achieved acceptable results when comparing identifiers extracted by Txt2ids (from texts that had been automatically generated by an ontology verbaliser from a large corpus of ontologies) with the original identifiers from the same corpus. Txt2ids tends to generate greater numbers of identifiers than were present in the original ontology; however, many of the additional ones seem reasonable suggestions. To assist in the design of a future system evaluation, a pilot study was conducted in which identifier names extracted by Txt2ids from short, expository texts compared favourably with those created by human users when building ontologies from the same texts. The system has been deployed in an ontology editor developed for the SWAT project.
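The abstract does not reproduce the regular expressions used by Txt2ids, so the following is only a minimal sketch of the general technique it describes: matching regular expressions against part-of-speech tag sequences and classifying the matched phrases. Everything named here is an assumption made for illustration: the one-letter tag codes, the three patterns, the function `extract_identifiers`, and the hand-tagged input sentence. Data property names and the post-processing into OWL Simplified English are left out.

```python
import re

# Hypothetical one-letter codes for a few Penn Treebank tags. Using one
# character per token means a regex match offset equals a token index.
TAG_CODES = {
    "JJ": "A",                 # adjective
    "NN": "N", "NNS": "N",     # common noun
    "NNP": "P", "NNPS": "P",   # proper noun
    "VBZ": "V", "VBP": "V",    # present-tense verb
    "IN": "I",                 # preposition
}

# Illustrative patterns only; these are not the expressions from the paper.
PATTERNS = {
    "class": re.compile(r"A*N+"),            # adjectives then common nouns
    "individual": re.compile(r"P+"),         # run of proper nouns
    "object_property": re.compile(r"VI?"),   # verb, optional preposition
}

def extract_identifiers(tagged):
    """Return candidate identifier phrases, grouped by assumed kind.

    `tagged` is a list of (word, POS tag) pairs from any tagger.
    """
    # Tags without a code become "x", which no pattern matches.
    codes = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged)
    results = {kind: [] for kind in PATTERNS}
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(codes):
            words = [word for word, _ in tagged[match.start():match.end()]]
            results[kind].append(" ".join(words))
    return results

# Hand-tagged example sentence (an assumption, not taken from the corpus).
sentence = [("Fido", "NNP"), ("lives", "VBZ"), ("in", "IN"),
            ("a", "DT"), ("small", "JJ"), ("kennel", "NN")]
print(extract_identifiers(sentence))
```

Mapping each tag to a single character is a design convenience: it lets ordinary string regular expressions run over the tag sequence while the match span still indexes directly into the token list.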

Similar resources

An Analysis of POS Tag Patterns in Ontology Identifiers and Labels

I describe an analysis of the syntax of identifier names found in a corpus of over 500 ontologies. The analysis was performed in five steps: (i) extraction of identifier names from the corpus; (ii) construction of dummy sentences containing the identifiers; (iii) part-of-speech (POS) tagging; (iv) extraction of POS tag strings; (v) POS string frequency analysis; and (vi) general syntactic patte...

A new nomenclature for fungi

Important changes brought about by the Melbourne International Code of Nomenclature for Algae, Fungi and Plants are briefly reviewed concerning a clarification of the spelling and typification of sanctioned fungal names, the recognition of electronic publication for the validity of nomenclatural novelties, permission to use English diagnoses or descriptions for their valid publication, and the requ...

Ontology-based Normalization for Disease-Lab test Relation Extraction

This poster describes our preliminary work on ontology-based normalization for diseases and lab tests, as a fundamental step toward disease-lab test relation extraction. Multiple ontologies are leveraged for this aim. Specifically, diseases and lab tests are first extracted and mapped to the Concept Unique Identifier (CUI) of the Unified Medical Language System (UMLS) by MetaMap. Codes of Inter...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Style Guidelines for Naming and Labeling Ontologies in the Multilingual Web

In the context of the Semantic Web, natural language descriptions associated with ontologies have proven to be of major importance not only to support ontology developers and adopters, but also to assist in tasks such as ontology mapping, information extraction, or natural language generation. In the state-of-the-art we find some attempts to provide guidelines for URI local names in English, an...

Publication date: 2013